Large Social Networks can be Targeted for Viral Marketing with Small Seed Sets
In a "tipping" model, each node in a social network, representing an
individual, adopts a behavior if a certain number of his incoming neighbors
previously adopted it. A key problem for viral marketers is to
determine an initial "seed" set such that, if its members are given the
behavior, the entire network adopts it. Here we introduce a method for quickly
finding seed sets that scales to very large networks. Our approach finds a set
of nodes that guarantees spreading to the entire network under the tipping
model. After experimentally evaluating 31 real-world networks, we found that
our approach often finds such sets that are several orders of magnitude smaller
than the population size. Our approach also scales well: on a Friendster
social network consisting of 5.6 million nodes and 28 million edges we found a
seed set in under 3.6 hours. We also find that highly clustered local
neighborhoods and dense network-wide community structure together suppress the
ability of a trend to spread under the tipping model.
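The tipping model described in this abstract can be illustrated with a minimal sketch. It is not the paper's seed-finding method; it only checks how far a given seed set spreads. The function name `tipping_spread`, the dictionary-based graph layout, and the per-node `thresholds` mapping are assumptions made for illustration, and the graph is assumed to have no duplicate edges.

```python
from collections import deque


def tipping_spread(in_neighbors, thresholds, seeds):
    """Return the set of nodes that adopt under a tipping model.

    in_neighbors: dict mapping each node to a list of its incoming neighbors
                  (hypothetical layout; assumes no duplicate edges).
    thresholds:   dict mapping each node to the number of adopting incoming
                  neighbors required before it adopts.
    seeds:        iterable of initially adopting nodes.
    """
    # Invert the in-neighbor lists so adoption can be pushed forward.
    out_neighbors = {v: [] for v in in_neighbors}
    for v, preds in in_neighbors.items():
        for u in preds:
            out_neighbors.setdefault(u, []).append(v)

    adopted = set(seeds)
    adopting_preds = {v: 0 for v in out_neighbors}
    queue = deque(adopted)  # each adopter is processed exactly once

    while queue:
        u = queue.popleft()
        for v in out_neighbors.get(u, []):
            if v in adopted:
                continue
            adopting_preds[v] += 1
            if adopting_preds[v] >= thresholds[v]:
                adopted.add(v)
                queue.append(v)
    return adopted


# Toy graph: a -> c, b -> c, c -> d; node c needs two adopting neighbors.
in_neighbors = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
thresholds = {"a": 1, "b": 1, "c": 2, "d": 1}
print(tipping_spread(in_neighbors, thresholds, {"a"}))
print(tipping_spread(in_neighbors, thresholds, {"a", "b"}))
```

In the toy graph, seeding only `a` leaves `c` below its threshold, whereas seeding both `a` and `b` tips `c` and then `d`, so that seed set reaches the entire network.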
Spatio-Temporal Reasoning About Agent Behavior
There are many applications where we wish to reason about spatio-temporal aspects of an agent's behavior. This dissertation examines several facets of this type of reasoning. First, given a model of past agent behavior, we wish to reason about the probability that an agent takes a given action at a certain time. Previous work combining temporal and probabilistic reasoning has made either independence or Markov assumptions. This work introduces Annotated Probabilistic Temporal (APT) logic, which makes neither assumption. Statements in APT logic consist of rules of the form "Formula G becomes true with a probability [L,U] within T time units after formula F becomes true" and can be written by experts or extracted automatically. We explore the problem of entailment: finding the probability that an agent performs a given action at a certain time based on such a model. We study this problem's complexity and develop a sound but incomplete fixpoint operator as a heuristic, implementing it and testing it on models automatically generated from several datasets.
Second, agent behavior often results in "observations" at geospatial locations that imply the existence of other, unobserved, locations we wish to find ("partners"). In this dissertation, we formalize this notion with "geospatial abduction problems" (GAPs). GAPs try to infer a set of partner locations for a set of observations and a model representing the relationship between observations and partners for a given agent. This dissertation presents exact and approximate algorithms for solving GAPs as well as an implemented software package for addressing these problems called SCARE (the Spatio-Cultural Abductive Reasoning Engine). We tested SCARE on counter-insurgency data from Iraq and obtained good results. We then provide an adversarial extension to GAPs as follows: given a fixed set of observations, if an adversary has probabilistic knowledge of how an agent would find a corresponding set of partners, he would place the partners in locations that minimize the expected number of partners found by the agent. We examine this problem, along with its complement, by studying their computational complexity, developing algorithms, and implementing approximation schemes.
We also introduce a class of problems called geospatial optimization problems (GOPs). Here the agent has a set of actions that modify attributes of a geospatial region and he wishes to select a limited number of such actions (with respect to some budget and other constraints) in a manner that maximizes a benefit function. We study the complexity of this problem and develop exact methods. We then develop an approximation algorithm with a guarantee. For some real-world applications, such as epidemiology, there is an underlying diffusion process that also affects geospatial properties. We address this with social network optimization problems (SNOPs) where, given a weighted, labeled, directed graph, we seek a set of vertices that, if given some initial property, optimize an aggregate study with respect to such diffusion. We develop and implement a heuristic that obtains a guarantee for a large class of such problems.
Strongly Hierarchical Factorization Machines and ANOVA Kernel Regression
High-order parametric models that include terms for feature interactions are
applied to various data mining tasks, where ground truth depends on
interactions of features. However, with sparse data, the high-dimensional
parameters for feature interactions often face three issues: expensive
computation, difficulty in parameter estimation and lack of structure. Previous
work has proposed approaches which can partially resolve the three issues. In
particular, models with factorized parameters (e.g. Factorization Machines) and
sparse learning algorithms (e.g. FTRL-Proximal) can tackle the first two issues
but fail to address the third. Regarding unstructured parameters,
constraints or complicated regularization terms are applied such that
hierarchical structures can be imposed. However, these methods make the
optimization problem more challenging. In this work, we propose Strongly
Hierarchical Factorization Machines and ANOVA kernel regression where all
three issues can be addressed without making the optimization problem more
difficult. Experimental results show the proposed models significantly
outperform the state-of-the-art in two data mining tasks: cold-start user
response time prediction and stock volatility prediction.
Comment: 9 pages, to appear in SDM'1
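For context on the "factorized parameters" this abstract mentions, the following is a minimal sketch of a standard second-order Factorization Machine prediction. It is not the Strongly Hierarchical variant the paper proposes. The function name `fm_predict` and the list-of-lists layout for the factor matrix `V` are assumptions made for illustration.

```python
def fm_predict(x, w0, w, V):
    """Second-order Factorization Machine prediction (standard form).

    x:  feature vector (list of floats)
    w0: global bias
    w:  linear weights, one per feature
    V:  factor matrix, V[i] is the k-dimensional factor vector of feature i
        (hypothetical list-of-lists layout).
    """
    n = len(x)
    k = len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))

    # Pairwise term sum_{i<j} <V[i], V[j]> x_i x_j, computed in O(k * n)
    # via 0.5 * sum_f [(sum_i V[i][f] x_i)^2 - sum_i (V[i][f] x_i)^2].
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)

    return w0 + linear + pairwise


# Toy example with three features and two latent factors.
x = [1.0, 2.0, 0.0]
w0 = 0.5
w = [0.1, 0.2, 0.3]
V = [[1.0, 0.0], [0.5, 1.0], [2.0, 2.0]]
print(fm_predict(x, w0, w, V))
```

Because the interaction weight for a feature pair is the inner product of two factor vectors rather than a free parameter, the model can estimate interactions for pairs that rarely co-occur in sparse data, and the rewritten sum keeps the cost linear in the number of features.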